Value-Update Rules for Real-Time Search
Authors
Abstract
Real-time search methods have successfully been used to solve a large variety of search problems, but their properties are largely unknown. In this paper, we study how existing real-time search methods scale up. We compare two real-time search methods that have been used successfully in the literature and differ only in the update rules of their values: Node Counting, a real-time search method that always moves to the successor state that it has visited the least number of times so far, and Learning Real-Time A*, a similar real-time search method. Both real-time search methods seemed to perform equally well in many standard domains from artificial intelligence. Our formal analysis is therefore surprising. We show that the performance of Node Counting can be exponential in the number of states even in undirected domains. This solves an open problem and shows that the two real-time search methods do not always perform similarly in undirected domains, since the performance of Learning Real-Time A* is known to be polynomial in the number of states at worst.

Traditional search methods from artificial intelligence, such as the A* method (Nilsson 1971), first plan and then execute the resulting plan. Real-time (heuristic) search methods (Korf 1987), on the other hand, interleave planning and plan execution, and allow for fine-grained control over how much planning to perform between plan executions. Planning is done via local searches, that is, searches that are restricted to the part of the domain around the current state of the agent. The idea behind this search methodology is not to attempt to find plans with minimal plan-execution time but rather to attempt to decrease the planning time or the sum of planning and plan-execution time over that of traditional search methods. (Ishida 1997) gives a good overview of real-time search methods.
Experimental evidence indicates that real-time search methods are efficient domain-independent search methods that outperform traditional search methods on a variety of search problems. Real-time search methods have, for example, successfully been applied to traditional search problems (Korf 1990), moving-target search problems (Ishida and Korf 1991), STRIPS-type planning problems (Bonet et al. 1997), robot navigation and localization problems with initial pose uncertainty (Koenig and Simmons 1998), totally observable Markov decision process problems (Barto et al. 1995), and partially observable Markov decision process problems (Geffner and Bonet 1998), among others. Despite this success of real-time search methods, not much is known about their properties. They differ in this respect from traditional search methods, whose properties have been researched extensively. For example, real-time search methods associate values with the states; these values are updated as the search progresses and used to determine which actions to execute. Different real-time search methods update these values differently, and no consensus has been reached so far on which value-update rule is best. Both (Russell and Wefald 1991) and (Pemberton and Korf 1992), for example, studied several value-update rules experimentally but arrived at different conclusions about which one outperformed the others. This demonstrates the need to understand better how the value-update rules influence the behavior of real-time search methods.

Copyright © 1999, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
In this paper, we investigate how two value-update rules scale up in undirected domains: one that interprets the values as approximations of the goal distances of the states (resulting in a real-time search method called Learning Real-Time A*) and one that interprets the values as the number of times the states have been visited (resulting in a real-time search method called Node Counting). The principle behind Node Counting is simple: always move to the neighboring state that has been visited the least number of times. This appears to be an intuitive exploration principle since, when exploring unknown environments, one wants to get to states that one has visited a smaller and smaller number of times, with the goal of getting as fast as possible to states that one has not visited yet. This explains why Node Counting has been used repeatedly in artificial intelligence. Experimental results indicate that both Node Counting and uninformed Learning Real-Time A* need about the same number of action executions on average to reach a goal state in many standard domains from artificial intelligence. However, to the best of our knowledge, our paper analyzes the performance of Node Counting in undirected domains for the first time, which is not surprising since the field of real-time search is a rather experimental one. We show that Node Counting sometimes reaches a goal state in undirected domains only after a number of action executions that is exponential in the number of states, whereas uninformed LRTA* is known to always reach a goal state after at most a polynomial number of action executions.

Initially, the u-values u(s) are zero for all s ∈ S.
1. s := s_start.
2. If s ∈ G, then stop successfully.
3. a := one-of arg min_{a ∈ A(s)} u(succ(s, a)).
4. Update u(s) using the value-update rule.
5. Execute action a and change the current state to succ(s, a).
6. s := succ(s, a).
7. Go to 2.

Figure 1: Real-Time Search
Thus, although many standard domains from artificial intelligence (such as sliding-tile puzzles, blocksworlds, and gridworlds) are undirected, this property alone is not sufficient to explain why Node Counting performs well on them. This result solves an open problem described in (Koenig and Simmons 1996). We also describe a non-trivial domain property that guarantees a polynomial performance of Node Counting and study a probabilistic variant of Node Counting. In general, our results show that experimental comparisons of real-time search methods are often insufficient to evaluate how well they scale up, because the performance of two similar real-time search methods can be very different even if experimental results seem to indicate otherwise. A formal analysis of real-time search methods can help to detect these problems and prevent surprises later on, as well as provide a solid theoretical foundation for interleaving planning and plan execution. We believe that it is important that more real-time search methods be analyzed similarly, especially since most work on real-time search has been of an experimental nature so far.

Notation

We use the following notation in this paper: S denotes the finite set of states of the domain, s_start ∈ S the start state, and G ⊆ S the set of goal states. The number of states is n := |S|. A(s) is the finite, nonempty set of actions that can be executed in state s ∈ S. succ(s, a) denotes the successor state that results from the execution of action a ∈ A(s) in state s ∈ S. To simplify matters, we measure the plan-execution times and goal distances in action executions throughout this paper, which is justified if the execution times of all actions are roughly the same. We also use two operators with the following semantics: Given a set X, the expression “one-of X” returns an element of X according to an arbitrary rule. A subsequent invocation of “one-of X” can return the same or a different element.
The expression “arg min_{x ∈ X} f(x)” returns the elements x ∈ X that minimize f(x), that is, the set {x ∈ X | f(x) = min_{x' ∈ X} f(x')}.

Node Counting and LRTA*

We study two similar real-time search methods, namely uninformed variants of Node Counting and Learning Real-Time A* that have a lookahead of only one action execution and fit the algorithmic skeleton shown in Figure 1. (Some researchers feel more comfortable referring to these methods as “agent-centered search methods” (Koenig 1995) and reserving the term “real-time search methods” for agent-centered search methods whose values approximate the goal distances of the states.) We chose to study uninformed real-time search methods because one of the methods we study has traditionally been used for exploring unknown environments in the absence of heuristic information (often in the context of robot navigation). Both real-time search methods associate a u-value u(s) with each state s ∈ S. The semantics of the u-values depend on the real-time search method, but all u-values are initialized with zeroes, reflecting that the real-time search methods are initially uninformed and thus do not have any a-priori information as to where the goal states are. The search task is to find any path from the start state to a goal state, not necessarily a shortest one. Both real-time search methods first check whether they have already reached a goal state and thus can terminate successfully (Line 2). If not, they decide on which action to execute in the current state (Line 3). They look one action execution ahead and always greedily choose an action that leads to a successor state with a minimal u-value (ties are broken arbitrarily). Then, they update the u-value of their current state using a value-update rule that depends on the semantics of the u-values and thus on the real-time search method (Line 4).
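The two operators can be sketched as small Python helpers (the names argmin_set and one_of are our own, chosen to mirror the notation; this is an illustration, not code from the paper):

```python
import random

def argmin_set(xs, f):
    """Return all elements of xs that minimize f,
    i.e. the set {x in xs | f(x) = min_{x'} f(x')}."""
    best = min(f(x) for x in xs)
    return [x for x in xs if f(x) == best]

def one_of(xs):
    """Return an element of xs according to an arbitrary rule;
    repeated calls may return the same or a different element."""
    return random.choice(list(xs))
```

With these helpers, Line 3 of Figure 1 reads naturally as `one_of(argmin_set(A(s), lambda a: u[succ(s, a)]))`.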
Finally, they execute the selected action (Line 5), update the current state (Line 6), and iterate the procedure (Line 7). Many real-time search methods from the literature fit this algorithmic skeleton. We chose to compare Node Counting and Learning Real-Time A* because both of them implement simple, intuitive rules of thumb for how to interleave planning and plan execution. Node Counting can be described as follows:

Node Counting: A u-value u(s) of Node Counting corresponds to the number of times Node Counting has already been in state s. Node Counting always moves to a successor state with a minimal u-value because it wants to get to states which it has visited a smaller number of times, to eventually reach a state that it has not yet visited at all, that is, a potential goal state.

Value-Update Rule of Node Counting (Line 4 in Figure 1): u(s) := 1 + u(s).
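As a minimal Python sketch (our own, not from the paper), both methods can be obtained by plugging different value-update rules into the skeleton of Figure 1. The example domain below, an undirected four-state path, is purely illustrative:

```python
from collections import defaultdict

def real_time_search(start, goals, succ, actions, update, max_steps=10000):
    """Skeleton of Figure 1, parameterized by the value-update rule."""
    u = defaultdict(int)              # u-values, initially zero for all states
    s, steps = start, 0
    while s not in goals:             # Line 2: stop on reaching a goal state
        # Line 3: greedily pick an action leading to a minimal-u successor
        # (ties broken arbitrarily; here, by action order).
        a = min(actions(s), key=lambda a: u[succ(s, a)])
        update(u, s, succ, actions)   # Line 4: value-update rule
        s = succ(s, a)                # Lines 5-6: execute action, move
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step bound exceeded")
    return steps

def node_counting_update(u, s, succ, actions):
    # Node Counting: u(s) := 1 + u(s), i.e. u(s) counts visits to s.
    u[s] += 1

def lrta_update(u, s, succ, actions):
    # Uninformed LRTA*: u(s) := 1 + min_{a in A(s)} u(succ(s, a)),
    # i.e. u(s) approximates the goal distance of s.
    u[s] = 1 + min(u[succ(s, a)] for a in actions(s))

# Illustrative undirected path 0 - 1 - 2 - 3 with goal state 3.
def succ(s, a): return s + a
def actions(s): return [a for a in (-1, +1) if 0 <= s + a <= 3]

print(real_time_search(0, {3}, succ, actions, node_counting_update))  # 3
print(real_time_search(0, {3}, succ, actions, lrta_update))           # 3
```

On this easy domain both methods walk straight to the goal in three action executions; the paper's point is that on adversarially chosen undirected domains Node Counting can need exponentially many steps while LRTA* cannot.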
Publication date: 1999